1 RQ: What are the characteristics of student list purchases?

1.0.1 OOS Order Category: By MSA

3 of the 21 OOS orders were by MSA

  • 2 orders in 2019
  • 1 order in 2020

They made 2 “OOS Regional MSA” orders and 1 “Regional Counselor MSAs” order.

  • They used the segment analysis from above
  • A+/B- high school GPA
  • MSAs from states: CA, TX, GA, FL, NY, NJ

2 RQ: What are the characteristics of students/schools included vs. not included in purchased lists?

General observations

  • Of 121,110 students purchased between 2017 and 2020, 52,788 reside in California, about 44% of students purchased (assuming no student was purchased more than once).

  • Of the 52,788 students purchased in California, 19,071 (36%) were purchased in the LA metro area alone.
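
The shares in the two bullets above can be checked with a few lines of base R. This is only a sketch of the arithmetic: the variable names are made up for illustration, and the counts are the ones reported in the text, taken at face value (i.e., assuming no student appears more than once).

```r
# Counts reported in the text above
n_total <- 121110  # students purchased, 2017-2020
n_ca    <- 52788   # of those, students residing in California
n_la    <- 19071   # of the California students, those in the LA metro area

# Shares in percent, rounded to one decimal place
pct_ca_of_total <- round(100 * n_ca / n_total, 1)
pct_la_of_ca    <- round(100 * n_la / n_ca, 1)

cat("California share of all students purchased:", pct_ca_of_total, "%\n")
cat("LA metro share of California students purchased:", pct_la_of_ca, "%\n")
```

If some students were purchased more than once, the denominators would need to be counts of unique students rather than counts of rows.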

2.1 Zip code investigations

After merging CEEB and NCES data, there were 45,172 students purchased in California. Of these, 18,417 had zip codes that did not match the zip code of the high school they attended. Perhaps the zip codes we have are those of the purchased students’ home addresses?

  • If we use zip code of students’ households, then we can get more precise information about them at the zip code-level (e.g., median household income, race/ethnicity, education level).
  • If we use zip code of the high school, we can use secondary data on the high school’s performance and racial/ethnic composition.
  • Perhaps we could use both?
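
A minimal sketch of how the mismatch between the two zip codes could be flagged, assuming a student-level data frame that carries both the zip code from the student list and the zip code of the high school. The data frame `students` and its column names are hypothetical, and the rows are made up. Zip codes are kept as character strings so that leading zeros (e.g., in NJ) are not lost.

```r
# Toy data: one row per purchased student (all values are made up)
students <- data.frame(
  student_id  = 1:5,
  student_zip = c("90001", "90210", "91101", "90001", "92660"),
  hs_zip      = c("90001", "90024", "91101", "90011", "92660"),
  stringsAsFactors = FALSE
)

# Flag students whose list zip code differs from their high school's zip code
students$zip_mismatch <- students$student_zip != students$hs_zip

n_mismatch   <- sum(students$zip_mismatch)
pct_mismatch <- round(100 * mean(students$zip_mismatch), 1)

# The same share computed from the counts reported above
pct_reported <- round(100 * 18417 / 45172, 1)

cat("Toy data:", n_mismatch, "mismatches,", pct_mismatch, "%\n")
cat("Reported counts:", pct_reported, "% of merged California students\n")
```

Keeping both columns on the student record leaves the option of using either zip code, or both, in later analyses.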

2.2 Out-of-State, zip codes purchased in LA metro area vs. not purchased

  • If we select the Number of students purchased button, we can see that more students are purchased in zip codes with a smaller percentage of Black, Latinx, and Native American residents and with higher overall median household income.
  • Notice that areas shaded in gray (NA) are zip codes where no students were purchased (~68 zip codes). Consistent with the pattern above, these zip codes tend to be home to more Black, Latinx, and Native American people and have a lower median household income.

2.3 RQ: What are the characteristics of students included vs. not included in purchased lists in the LA metro area?

All “non-purchased” zip codes included below are zip codes in the LA metro area where no student was purchased.

  • The zip code from the student lists was used here. In other words, zip code represents the zip code associated with students’ address.

    • Binary version of zip_purchased is problematic because of variation in number of students purchased by zip
acs_race_zipcode_grouped %>%
  ggplot(aes(x=as.factor(zip_purchased), y=median_household_income, color=zip_purchased)) +
  geom_bar(stat="identity", fill="white") +  
  labs(x = "Non-purchased vs. Purchased zip codes",
       y = "Mean median household income") +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1), legend.position = "none") 

acs_race_zipcode_grouped %>%
  ggplot(aes(x=as.factor(zip_purchased), y=pop_white_15_19_pct, color=zip_purchased)) +
  geom_bar(stat="identity", fill="white") +  
  labs(x = "Non-purchased vs. Purchased zip codes",
       y = "Mean percent White") +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1), legend.position = "none") 

acs_race_zipcode_grouped %>%
  ggplot(aes(x=as.factor(zip_purchased), y=pop_asian_15_19_pct, color=zip_purchased)) +
  geom_bar(stat="identity", fill="white") +  
  labs(x = "Non-purchased vs. Purchased zip codes",
       y = "Mean percent Asian") +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1), legend.position = "none") 

acs_race_zipcode_grouped %>%
  ggplot(aes(x=as.factor(zip_purchased), y=pop_hispanic_15_19_pct, color=zip_purchased)) +
  geom_bar(stat="identity", fill="white") + 
  labs(x = "Non-purchased vs. Purchased zip codes",
       y = "Mean percent Hispanic") +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1), legend.position = "none") 

acs_race_zipcode_grouped %>%
  ggplot(aes(x=as.factor(zip_purchased), y=pop_black_15_19_pct, color=zip_purchased)) +
  geom_bar(stat="identity", fill="white") + 
  labs(x = "Non-purchased vs. Purchased zip codes",
       y = "Mean percent Black") +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1), legend.position = "none") 
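
The axis labels above ("Mean median household income", "Mean percent White", etc.) suggest that acs_race_zipcode_grouped holds one row per group with the mean of each zip-level characteristic. How that data frame was actually built is not shown here, so the following is only a sketch under that assumption, using made-up values and the hypothetical names `zip_acs` and `grouped_sketch`.

```r
# Toy zip-level data (all values are made up for illustration)
zip_acs <- data.frame(
  zip                     = c("90001", "90011", "90024", "90210", "91101"),
  zip_purchased           = c(0, 0, 1, 1, 1),
  median_household_income = c(40000, 50000, 80000, 100000, 90000),
  pop_hispanic_15_19_pct  = c(80, 70, 20, 10, 30),
  stringsAsFactors = FALSE
)

# Unweighted mean of each characteristic within non-purchased (0) and
# purchased (1) zip codes: one row per group
grouped_sketch <- aggregate(
  cbind(median_household_income, pop_hispanic_15_19_pct) ~ zip_purchased,
  data = zip_acs,
  FUN  = mean
)

print(grouped_sketch)
```

Note that an unweighted mean treats every zip code equally regardless of population; a population-weighted mean would be a different design choice.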

Large variation in the number of students purchased by zip codes (student-level zip code)
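
To illustrate why the binary flag hides this variation, here is a sketch of how a count-based category like purchased_num_cat could be derived from the number of students purchased per zip code. The cut points, the labels, and the column `purchased_num` are assumptions for illustration; the categories actually used in the plots below may differ.

```r
# Toy data: number of students purchased per zip code (made-up values)
zips <- data.frame(
  zip           = c("90001", "90024", "90210", "91101", "92660", "90011"),
  purchased_num = c(0, 3, 45, 120, 12, 0),
  stringsAsFactors = FALSE
)

# Binary flag: 1 if any student was purchased in the zip code, else 0
zips$zip_purchased <- as.integer(zips$purchased_num > 0)

# Ordered count categories; these cut points are hypothetical
zips$purchased_num_cat <- cut(
  zips$purchased_num,
  breaks = c(-Inf, 0, 10, 50, Inf),
  labels = c("0", "1-10", "11-50", "51+"),
  right  = TRUE
)

# The binary flag lumps 3 and 120 students together; the categories do not
print(table(zips$zip_purchased))
print(table(zips$purchased_num_cat))
```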

acs_race_zipcode_groupedv2 %>%
  ggplot(aes(x=as.factor(purchased_num_cat), y=median_household_income, color=purchased_num_cat)) +
  geom_bar(stat="identity", fill="white") + 
  labs(x = "Number of students purchased by zip code",
       y = "Mean median income") +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1), legend.position = "none") 

acs_race_zipcode_groupedv2 %>%
  ggplot(aes(x=as.factor(purchased_num_cat), y=pop_white_15_19_pct, color=purchased_num_cat)) +
  geom_bar(stat="identity", fill="white") +  
  labs(x = "Number of students purchased by zip code",
       y = "Mean percent White") +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1), legend.position = "none") 

acs_race_zipcode_groupedv2 %>%
  ggplot(aes(x=as.factor(purchased_num_cat), y=pop_asian_15_19_pct, color=purchased_num_cat)) +
  geom_bar(stat="identity", fill="white") +
  labs(x = "Number of students purchased by zip code",
       y = "Mean percent Asian") +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1), legend.position = "none") 

acs_race_zipcode_groupedv2 %>%
  ggplot(aes(x=as.factor(purchased_num_cat), y=pop_hispanic_15_19_pct, color=purchased_num_cat)) +
  geom_bar(stat="identity", fill="white") +  
  labs(x = "Number of students purchased by zip code",
       y = "Mean percent Hispanic") +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1), legend.position = "none") 

acs_race_zipcode_groupedv2 %>%
  ggplot(aes(x=as.factor(purchased_num_cat), y=pop_black_15_19_pct, color=purchased_num_cat)) +
  geom_bar(stat="identity", fill="white") +  
  labs(x = "Number of students purchased by zip code",
       y = "Mean percent Black") +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1), legend.position = "none") 

The table below lists zip codes in the LA MSA by number of students purchased, in descending order.

acs_zip_pur_la

The table below displays median income and race/ethnicity characteristics of zip codes in the LA MSA that were not purchased.

acs_zip_npur_la
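
The two tables above could be derived from a single zip-level data frame along the following lines. This is a sketch with made-up values; `zip_la`, `purchased`, and `not_purchased` are hypothetical names, not the objects used in this analysis.

```r
# Toy zip-level data for one metro area (all values are made up)
zip_la <- data.frame(
  zip                     = c("90001", "90011", "90024", "90210", "91101"),
  purchased_num           = c(0, 0, 45, 120, 12),
  median_household_income = c(40000, 50000, 80000, 100000, 90000),
  stringsAsFactors = FALSE
)

# Zip codes with at least one student purchased, largest counts first
purchased <- zip_la[zip_la$purchased_num > 0, ]
purchased <- purchased[order(purchased$purchased_num, decreasing = TRUE), ]

# Zip codes in the metro area where no student was purchased
not_purchased <- zip_la[zip_la$purchased_num == 0, ]

print(purchased)
print(not_purchased)
```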

2.4 RQ: What are the characteristics of students included vs. not included in purchased lists in the NY metro area?

  • As with the LA MSA, I am using the zip code from the student list (students’ address).

  • All “non-purchased” zip codes included below are zip codes in the NY metro area where no student was purchased.

  • The zip code from the student lists data set was used here. In other words, zip code represents the zip code associated with students’ address.

    • Binary version of zip_purchased is problematic because of variation in number of students purchased by zip
acs_race_zipcode_grouped %>%
  ggplot(aes(x=as.factor(zip_purchased), y=median_household_income, color=zip_purchased)) +
  geom_bar(stat="identity", fill="white") +  
  labs(x = "Non-purchased vs. Purchased zip codes",
       y = "Mean median income") +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1), legend.position = "none") 

acs_race_zipcode_grouped %>%
  ggplot(aes(x=as.factor(zip_purchased), y=pop_white_15_19_pct, color=zip_purchased)) +
  geom_bar(stat="identity",fill="white") +  
  labs(x = "Non-purchased vs. Purchased zip codes",
       y = "Mean percent White") +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1), legend.position = "none")  

acs_race_zipcode_grouped %>%
  ggplot(aes(x=as.factor(zip_purchased), y=pop_asian_15_19_pct, color=zip_purchased)) +
  geom_bar(stat="identity",fill="white") +  
  labs(x = "Non-purchased vs. Purchased zip codes",
       y = "Mean percent Asian") +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1), legend.position = "none") 

acs_race_zipcode_grouped %>%
  ggplot(aes(x=as.factor(zip_purchased), y=pop_hispanic_15_19_pct, color=zip_purchased)) +
  geom_bar(stat="identity",fill="white") +  
  labs(x = "Non-purchased vs. Purchased zip codes",
       y = "Mean percent Hispanic") +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1), legend.position = "none") 

acs_race_zipcode_grouped %>%
  ggplot(aes(x=as.factor(zip_purchased), y=pop_black_15_19_pct, color=zip_purchased)) +
  geom_bar(stat="identity", fill="white") +  
  labs(x = "Non-purchased vs. Purchased zip codes",
       y = "Mean percent Black") +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1), legend.position = "none") 

Following the same logic, the plots below use the number of students purchased by zip code.
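
Since the same bar chart is repeated for every characteristic and metro area, the plotting logic could be wrapped in one helper. The function `plot_zip_bar()` below is hypothetical and not part of this analysis; it assumes ggplot2 is installed and uses geom_col(), which is equivalent to geom_bar(stat = "identity"). Unlike the chunks in this document, it draws a constant black outline instead of mapping color to the grouping variable.

```r
library(ggplot2)

# Hypothetical helper: bar chart of a group-level mean (y_var) against a
# grouping column (x_var), styled like the charts in this document
plot_zip_bar <- function(data, x_var, y_var, x_label, y_label) {
  ggplot(data, aes(x = as.factor(.data[[x_var]]), y = .data[[y_var]])) +
    geom_col(color = "black", fill = "white") +
    labs(x = x_label, y = y_label) +
    theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1),
          legend.position = "none")
}

# Toy grouped data (made-up values), one row per group
grouped_toy <- data.frame(
  zip_purchased           = c(0, 1),
  median_household_income = c(45000, 90000),
  pop_black_15_19_pct     = c(12, 5)
)

p_income <- plot_zip_bar(grouped_toy, "zip_purchased", "median_household_income",
                         "Non-purchased vs. Purchased zip codes",
                         "Mean median household income")
p_black  <- plot_zip_bar(grouped_toy, "zip_purchased", "pop_black_15_19_pct",
                         "Non-purchased vs. Purchased zip codes",
                         "Mean percent Black")
```

Printing `p_income` or `p_black` renders the chart; the same call works with a count-category column in place of the binary flag.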

Large variation in the number of students purchased by zip codes (student-level zip code)

acs_race_zipcode_groupedv2 %>%
  ggplot(aes(x=as.factor(purchased_num_cat), y=median_household_income, color=purchased_num_cat)) +
  geom_bar(stat="identity", fill="white") +  
   labs(x = "Number of students purchased by zip code",
       y = "Mean median income") +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1), legend.position = "none") 

acs_race_zipcode_groupedv2 %>%
  ggplot(aes(x=as.factor(purchased_num_cat), y=pop_white_15_19_pct, color=purchased_num_cat)) +
  geom_bar(stat="identity", fill="white") +  
   labs(x = "Number of students purchased by zip code",
       y = "Mean percent White") +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1), legend.position = "none") 

acs_race_zipcode_groupedv2 %>%
  ggplot(aes(x=as.factor(purchased_num_cat), y=pop_asian_15_19_pct, color=purchased_num_cat)) +
  geom_bar(stat="identity", fill="white") +  
   labs(x = "Number of students purchased by zip code",
       y = "Mean percent Asian") +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1), legend.position = "none") 

acs_race_zipcode_groupedv2 %>%
  ggplot(aes(x=as.factor(purchased_num_cat), y=pop_hispanic_15_19_pct, color=purchased_num_cat)) +
  geom_bar(stat="identity", fill="white") +  
   labs(x = "Number of students purchased by zip code",
       y = "Mean percent Hispanic") +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1), legend.position = "none") 

acs_race_zipcode_groupedv2 %>%
  ggplot(aes(x=as.factor(purchased_num_cat), y=pop_black_15_19_pct, color=purchased_num_cat)) +
  geom_bar(stat="identity", fill="white") +  
   labs(x = "Number of students purchased by zip code",
       y = "Mean percent Black") +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1), legend.position = "none") 

The table below sorts the number of students purchased by zip code in descending order.

acs_zip_pur_ny

The table below displays median income and race/ethnicity characteristics of zip codes that were not purchased in the NY metro area.

acs_zip_npur_ny

2.5 RQ: What are the characteristics of students included vs. not included in purchased lists in the Austin-Round Rock-San Marcos metro area?

  • As with the LA and NY MSAs, I am using the zip code from the student list (students’ address).

  • All “non-purchased” zip codes included below are zip codes in the Austin metro area where no student was purchased.

  • The zip code from the student lists was used here. In other words, zip code represents the zip code associated with students’ address.

    • Binary version of zip_purchased is problematic because of variation in number of students purchased by zip
acs_race_zipcode_grouped %>%
  ggplot(aes(x=as.factor(zip_purchased), y=median_household_income, color=zip_purchased)) +
  geom_bar(stat="identity", fill="white") +  
  labs(x = "Non-purchased vs. Purchased zip codes",
       y = "Mean median income") +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1), legend.position = "none") 

acs_race_zipcode_grouped %>%
  ggplot(aes(x=as.factor(zip_purchased), y=pop_white_15_19_pct, color=zip_purchased)) +
  geom_bar(stat="identity",fill="white") +  
  labs(x = "Non-purchased vs. Purchased zip codes",
       y = "Mean percent White") +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1), legend.position = "none")  

acs_race_zipcode_grouped %>%
  ggplot(aes(x=as.factor(zip_purchased), y=pop_asian_15_19_pct, color=zip_purchased)) +
  geom_bar(stat="identity", fill="white") +  
  labs(x = "Non-purchased vs. Purchased zip codes",
       y = "Mean percent Asian") +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1), legend.position = "none") 

acs_race_zipcode_grouped %>%
  ggplot(aes(x=as.factor(zip_purchased), y=pop_hispanic_15_19_pct, color=zip_purchased)) +
  geom_bar(stat="identity", fill="white") +  
  labs(x = "Non-purchased vs. Purchased zip codes",
       y = "Mean percent Hispanic") +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1), legend.position = "none") 

acs_race_zipcode_grouped %>%
  ggplot(aes(x=as.factor(zip_purchased), y=pop_black_15_19_pct, color=zip_purchased)) +
  geom_bar(stat="identity", fill="white") +  
  labs(x = "Non-purchased vs. Purchased zip codes",
       y = "Mean percent Black") +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1), legend.position = "none")  

Following the same logic, the plots below use the number of students purchased by zip code.

Large variation in the number of students purchased by zip codes (student-level zip code)

acs_race_zipcode_groupedv2 %>%
  ggplot(aes(x=as.factor(purchased_num_cat), y=median_household_income, color=purchased_num_cat)) +
  geom_bar(stat="identity", fill="white") +  
  labs(x = "Number of students purchased by zip code",
       y = "Mean median income") +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1), legend.position = "none") 

acs_race_zipcode_groupedv2 %>%
  ggplot(aes(x=as.factor(purchased_num_cat), y=pop_white_15_19_pct, color=purchased_num_cat)) +
  geom_bar(stat="identity",fill="white") +  
  labs(x = "Number of students purchased by zip code",
       y = "Mean percent White") +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1), legend.position = "none") 

acs_race_zipcode_groupedv2 %>%
  ggplot(aes(x=as.factor(purchased_num_cat), y=pop_asian_15_19_pct, color=purchased_num_cat)) +
  geom_bar(stat="identity", fill="white") +  
  labs(x = "Number of students purchased by zip code",
       y = "Mean percent Asian") +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1), legend.position = "none")  

acs_race_zipcode_groupedv2 %>%
  ggplot(aes(x=as.factor(purchased_num_cat), y=pop_hispanic_15_19_pct, color=purchased_num_cat)) +
  geom_bar(stat="identity", fill="white") +  
  labs(x = "Number of students purchased by zip code",
       y = "Mean percent Hispanic") +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1), legend.position = "none") 

acs_race_zipcode_groupedv2 %>%
  ggplot(aes(x=as.factor(purchased_num_cat), y=pop_black_15_19_pct, color=purchased_num_cat)) +
  geom_bar(stat="identity", fill="white") +  
  labs(x = "Number of students purchased by zip code",
       y = "Mean percent Black") +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1), legend.position = "none") 

The table below lists zip codes by number of students purchased, in descending order.

acs_zip_pur_austin 

The table below displays median income and race/ethnicity characteristics of zip codes that were not purchased in the Austin metro area.

acs_zip_npur_austin

2.6 RQ: What are the characteristics of students included vs. not included in purchased lists in California?

  • As with the LA MSA, I am using the zip code from the student list (students’ address).

  • All “non-purchased” zip codes included below are zip codes in California where no student was purchased.

  • The zip code from the student lists was used here. In other words, zip code represents the zip code associated with students’ address.

    • Binary version of zip_purchased is problematic because of variation in number of students purchased by zip
acs_race_zipcode_grouped %>%
  ggplot(aes(x=as.factor(zip_purchased), y=median_household_income, color=zip_purchased)) +
  geom_bar(stat="identity", fill="white") +  
  labs(x = "Non-purchased vs. Purchased zip codes",
       y = "Mean median income") +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1), legend.position = "none") 

acs_race_zipcode_grouped %>%
  ggplot(aes(x=as.factor(zip_purchased), y=pop_white_15_19_pct, color=zip_purchased)) +
  geom_bar(stat="identity", fill="white") +  
  labs(x = "Non-purchased vs. Purchased zip codes",
       y = "Mean percent White") +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1), legend.position = "none") 

acs_race_zipcode_grouped %>%
  ggplot(aes(x=as.factor(zip_purchased), y=pop_asian_15_19_pct, color=zip_purchased)) +
  geom_bar(stat="identity", fill="white") +  
  labs(x = "Non-purchased vs. Purchased zip codes",
       y = "Mean percent Asian") +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1), legend.position = "none")  

acs_race_zipcode_grouped %>%
  ggplot(aes(x=as.factor(zip_purchased), y=pop_hispanic_15_19_pct, color=zip_purchased)) +
  geom_bar(stat="identity", fill="white") +  
  labs(x = "Non-purchased vs. Purchased zip codes",
       y = "Mean percent Hispanic") +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1), legend.position = "none")  

acs_race_zipcode_grouped %>%
  ggplot(aes(x=as.factor(zip_purchased), y=pop_black_15_19_pct, color=zip_purchased)) +
  geom_bar(stat="identity", fill="white") +  
  labs(x = "Non-purchased vs. Purchased zip codes",
       y = "Mean percent Black") +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1), legend.position = "none") 

Following the same logic, the plots below use the number of students purchased by zip code.

Large variation in the number of students purchased by zip codes (student-level zip code)

acs_race_zipcode_groupedv2 %>%
  ggplot(aes(x=as.factor(purchased_num_cat), y=median_household_income, color=purchased_num_cat)) +
  geom_bar(stat="identity", fill="white") +  
  labs(x = "Number of students purchased by zip code",
       y = "Mean median income") +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1), legend.position = "none") 

acs_race_zipcode_groupedv2 %>%
  ggplot(aes(x=as.factor(purchased_num_cat), y=pop_white_15_19_pct, color=purchased_num_cat)) +
  geom_bar(stat="identity", fill="white") +  
  labs(x = "Number of students purchased by zip code",
       y = "Mean percent White") +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1), legend.position = "none")  

acs_race_zipcode_groupedv2 %>%
  ggplot(aes(x=as.factor(purchased_num_cat), y=pop_asian_15_19_pct, color=purchased_num_cat)) +
  geom_bar(stat="identity", fill="white") +  
  labs(x = "Number of students purchased by zip code",
       y = "Mean percent Asian") +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1), legend.position = "none")  

acs_race_zipcode_groupedv2 %>%
  ggplot(aes(x=as.factor(purchased_num_cat), y=pop_hispanic_15_19_pct, color=purchased_num_cat)) +
  geom_bar(stat="identity", fill="white") +  
  labs(x = "Number of students purchased by zip code",
       y = "Mean percent Hispanic") +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1), legend.position = "none")  

acs_race_zipcode_groupedv2 %>%
  ggplot(aes(x=as.factor(purchased_num_cat), y=pop_black_15_19_pct, color=purchased_num_cat)) +
  geom_bar(stat="identity", fill="white") +  
  labs(x = "Number of students purchased by zip code",
       y = "Mean percent Black") +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1), legend.position = "none") 

The table below lists zip codes by number of students purchased, in descending order.

acs_zip_pur_ca

The table below displays median income and race/ethnicity characteristics of zip codes that were not purchased in California.

acs_zip_npur_ca